Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning
Stackelberg equilibria arise naturally in a range of popular learning
problems, such as in security games or indirect mechanism design, and have
received increasing attention in the reinforcement learning literature. We
present a general framework for implementing Stackelberg equilibria search as a
multi-agent RL problem, allowing a wide range of algorithmic design choices. We
discuss how previous approaches can be seen as specific instantiations of this
framework. As a key insight, we note that the design space allows for
approaches not previously seen in the literature, for instance by leveraging
multitask and meta-RL techniques for follower convergence. We propose one such
approach using contextual policies, and evaluate it experimentally on both
standard and novel benchmark domains, showing greatly improved sample
efficiency compared to previous approaches. Finally, we explore the effect of
adopting algorithm designs outside the borders of our framework.
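
As a rough illustration of the equilibrium concept in the abstract above (not the paper's framework or algorithm), a pure-strategy Stackelberg solution of a small matrix game can be found by enumeration: the leader commits to an action, the follower best-responds, and the leader picks the commitment with the best resulting payoff. The function name, the payoff matrices, and the tie-breaking rule in the leader's favour are all assumptions made for this sketch.

```python
from typing import List, Tuple


def pure_stackelberg(leader_payoff: List[List[float]],
                     follower_payoff: List[List[float]]) -> Tuple[int, int, float]:
    """Return (leader_action, follower_action, leader_value) under pure commitment.

    Rows index leader actions, columns index follower actions.
    """
    best = None
    for a, follower_row in enumerate(follower_payoff):
        top = max(follower_row)
        # All follower best responses to the leader's commitment `a`.
        responses = [b for b, u in enumerate(follower_row) if u == top]
        # Assumption: ties are broken in the leader's favour.
        b = max(responses, key=lambda j: leader_payoff[a][j])
        value = leader_payoff[a][b]
        if best is None or value > best[2]:
            best = (a, b, value)
    return best


# Hypothetical 2x2 game: row 0 strictly dominates row 1 for the leader,
# yet committing to row 1 steers the follower to column 1.
LEADER = [[2, 4], [1, 3]]
FOLLOWER = [[1, 0], [0, 1]]
result = pure_stackelberg(LEADER, FOLLOWER)
```

In this toy game the leader does better by committing to the dominated row, which is the kind of gap between simultaneous play and commitment that makes the follower's convergence to a best response central to the learning problem.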
Multi-unit Bilateral Trade
We characterise the set of dominant strategy incentive compatible (DSIC),
strongly budget balanced (SBB), and ex-post individually rational (IR)
mechanisms for the multi-unit bilateral trade setting. In such a setting there
is a single buyer and a single seller who holds a finite number k of identical
items. The mechanism has to decide how many units of the item are transferred
from the seller to the buyer and how much money is transferred from the buyer
to the seller. We consider two classes of valuation functions for the buyer and
seller: Valuations that are increasing in the number of units in possession,
and the more specific class of valuations that are increasing and submodular.
Furthermore, we present some approximation results about the performance of
certain such mechanisms, in terms of social welfare: For increasing submodular
valuation functions, we show the existence of a deterministic 2-approximation
mechanism and a randomised e/(e-1) approximation mechanism, matching the best
known bounds for the single-item setting.
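
For intuition about the three properties named in the abstract above, the classic single-item fixed-price mechanism is a convenient reference point. The sketch below is that textbook single-item mechanism, not the multi-unit characterisation from the paper; the function name and return convention are assumptions.

```python
def fixed_price_trade(buyer_value: float, seller_value: float, price: float):
    """Single-item fixed-price mechanism.

    Returns (trade, paid_by_buyer, received_by_seller).
    The price is set before any reports are seen, so no report can move it (DSIC).
    """
    # Trade only if both sides weakly gain at the posted price (ex-post IR).
    trade = buyer_value >= price >= seller_value
    payment = price if trade else 0.0
    # The buyer pays exactly what the seller receives (strong budget balance).
    return trade, payment, payment


outcome_trade = fixed_price_trade(buyer_value=10, seller_value=4, price=6)
outcome_no_trade = fixed_price_trade(buyer_value=5, seller_value=4, price=6)
```

The second call shows the welfare loss such mechanisms accept: the buyer values the item more than the seller does, but no trade occurs because the posted price lies above the buyer's value.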
Grounding or Guesswork? Large Language Models are Presumptive Grounders
Effective conversation requires common ground: a shared understanding between
the participants. Common ground, however, does not emerge spontaneously in
conversation. Speakers and listeners work together to both identify and
construct a shared basis while avoiding misunderstanding. To accomplish
grounding, humans rely on a range of dialogue acts, like clarification (What do
you mean?) and acknowledgment (I understand.). In domains like teaching and
emotional support, carefully constructing grounding prevents misunderstanding.
However, it is unclear whether large language models (LLMs) leverage these
dialogue acts in constructing common ground. To this end, we curate a set of
grounding acts and propose corresponding metrics that quantify attempted
grounding. We study whether LLMs use these grounding acts, simulating them
taking turns from several dialogue datasets, and comparing the results to
humans. We find that current LLMs are presumptive grounders, biased towards
assuming common ground without using grounding acts. To understand the roots of
this behavior, we examine the role of instruction tuning and reinforcement
learning with human feedback (RLHF), finding that RLHF leads to less grounding.
Altogether, our work highlights the need for more research investigating
grounding in human-AI interaction. Comment: 16 pages, 2 figures
Learning Stackelberg Equilibria and Applications to Economic Design Games
We study the use of reinforcement learning to learn the optimal leader's
strategy in Stackelberg games. Learning a leader's strategy has an innate
stationarity problem -- when optimizing the leader's strategy, the followers'
strategies might shift. To circumvent this problem, we model the followers via
no-regret dynamics to converge to a Bayesian Coarse-Correlated Equilibrium
(B-CCE) of the game induced by the leader. We then embed the followers'
no-regret dynamics in the leader's learning environment, which allows us to
formulate our learning problem as a standard POMDP. We prove that the optimal
policy of this POMDP achieves the same utility as the optimal leader's strategy
in our Stackelberg game. We solve this POMDP using actor-critic methods, where
the critic is given access to the joint information of all the agents. Finally,
we show that our methods are able to learn optimal leader strategies in a
variety of settings of increasing complexity, including indirect mechanisms
where the leader's strategy is setting up the mechanism's rules.
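
The abstract above does not say which no-regret algorithm drives the followers. As one standard example, the sketch below uses the Hedge (multiplicative weights) update for a single follower facing a sequence of loss vectors. The function name, the learning rate `eta`, and the loss sequence are assumptions chosen for illustration, and the Bayesian, multi-follower setting of the paper is not modelled.

```python
import math
from typing import List, Sequence


def hedge(loss_rounds: Sequence[Sequence[float]], n_actions: int,
          eta: float = 0.5) -> List[float]:
    """Run the Hedge update and return the final mixed strategy."""
    probs = [1.0 / n_actions] * n_actions
    for losses in loss_rounds:
        # Down-weight each action exponentially in the loss it just incurred.
        scaled = [p * math.exp(-eta * loss) for p, loss in zip(probs, losses)]
        total = sum(scaled)
        # Renormalise so the weights remain a probability distribution.
        probs = [s / total for s in scaled]
    return probs


# Hypothetical stationary environment: action 1 always has the lowest loss.
rounds = [[1.0, 0.0, 0.5]] * 50
final_strategy = hedge(rounds, n_actions=3)
```

With a fixed leader strategy the induced losses are stationary, as in this toy run, and the follower's strategy concentrates on the best action. Embedding such an update inside the leader's environment is what lets the leader treat the followers' adaptation as part of the environment dynamics.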
Riemannian tangent space mapping and elastic net regularization for cost-effective EEG markers of brain atrophy in Alzheimer's disease
The diagnosis of Alzheimer's disease (AD) in routine clinical practice is
most commonly based on subjective clinical interpretations. Quantitative
electroencephalography (QEEG) measures have been shown to reflect
neurodegenerative processes in AD and might qualify as affordable and thereby
widely available markers to facilitate the objectivization of AD assessment.
Here, we present a novel framework combining Riemannian tangent space mapping
and elastic net regression for the development of brain atrophy markers. While
most AD QEEG studies are based on small sample sizes and psychological test
scores as outcome measures, here we train and test our models using data of one
of the largest prospective EEG AD trials ever conducted, including MRI
biomarkers of brain atrophy. Comment: Presented at NIPS 2017 Workshop on Machine Learning for Health
A Bayesian approach to analyzing phenotype microarray data enables estimation of microbial growth parameters
Biolog phenotype microarrays enable simultaneous, high throughput analysis of cell cultures in different environments. The output is high-density time-course data showing redox curves (approximating growth) for each experimental condition. The software provided with the Omnilog incubator/reader summarizes each time-course as a single datum, so most of the information is not used. However, the time courses can be extremely varied and often contain detailed qualitative (shape of curve) and quantitative (values of parameters) information. We present a novel, Bayesian approach to estimating parameters from Phenotype Microarray data, fitting growth models using Markov Chain Monte Carlo methods to enable high throughput estimation of important information, including length of lag phase, maximal "growth" rate and maximum output. We find that the Baranyi model for microbial growth is useful for fitting Biolog data. Moreover, we introduce a new growth model that allows for diauxic growth with a lag phase, which is particularly useful where Phenotype Microarrays have been applied to cells grown in complex mixtures of substrates, for example in industrial or biotechnological applications, such as worts in brewing. Our approach provides more useful information from Biolog data than existing, competing methods, and allows for valuable comparisons between data series and across different models.
Market intermediation: information, computation, and incentives
Auctions are a major field of interest in game theory and in the wider microeconomics area, reflected by recognitions such as Nobel prizes to William Vickrey and Paul Milgrom. The algorithmic game theory literature too provides discussion of a wide range of different auction settings. But real-life markets are rarely comprised of a single monopolist facing buyers without alternative. We therefore explore market intermediation, in which we aim to match buyers and sellers to achieve some objective. While auctions have been well-explored in manifold variations, intermediation has received less attention in the literature. We aim to move beyond the independent, single-unit case and explore the limits of what can be achieved in more complex scenarios. In the first part, we look at a correlated-priors setting. We show that the revenue-optimal mechanism for this can be computed using a polynomial-time algorithm for one buyer and one seller. For two or more buyers, in contrast, we show that this problem is NP-hard, but that truthful-in-expectation mechanisms can be computed using an LP in polynomial time for a fixed number of buyers and sellers. In this setting we further discuss how market intermediation relates to classical auctions, as well as reverse auctions. Further motivating our results, we show that our discussion of market intermediation can lead back to useful results for both of these settings, giving an improved algorithm for the optimal two-bidder auction, and showing for the first time that a reverse auction behaves differently than an auction. In the second part, we consider an online intermediation setting, in which the market maker encounters an unknown sequence of buyers and sellers one at a time, with knowledge of their independent priors. We explore this from the point of view of online algorithms and competitive analysis, comparing against an offline adversary who knows the buyer-seller sequence in advance.
For the general case, we show that the competitive ratio of the intermediary’s revenue grows as the square root of the number of buyers and sellers. In contrast, we consider two settings with natural restrictions; one in which the sequence is balanced, and one in which there is an upper limit on the number of items the intermediary is allowed to hold at any one time. For both these settings we show that the competitive ratio is constant. Finally, in the third part we explore multi-unit intermediation. In this, we consider one seller and one buyer each having concave valuation of a number of items. The intermediary’s aim will be to maximise welfare, while maintaining budget balance. This setting has been explored for the single-item case, along with simple reductions for divisible goods to that case. We will give a strong characterisation result as well as approximation guarantees for the multi-unit case.
Cell-specific activation of the Nrf2 antioxidant pathway increases mucosal inflammation in acute but not in chronic colitis
BACKGROUND AND AIMS: The transcription factor Nrf2 is a major modulator of the cellular antioxidant response. Oxidative burst of infiltrating macrophages leads to a massive production of reactive oxygen species in inflamed tissue of inflammatory bowel disease patients. This oxidative burst contributes to tissue destruction and epithelial permeability, but it is also an essential part of the antibacterial defence. We therefore investigated the impact of the Nrf2 orchestrated antioxidant response in both acute and chronic intestinal inflammation.
METHODS: To study the role of Nrf2 overexpression in mucosal inflammation, we used transgenic mice conditionally expressing a constitutively active form of Nrf2 [caNrf2] either in epithelial cells or in the myeloid cell lineage. Acute colitis was induced by dextran sulphate sodium [DSS] in transgenic and control animals, and changes in gene expression were evaluated by genome-wide expression studies. Long-term effects of Nrf2 activation were studied in mice with an IL-10 (-/-) background.
RESULTS: Expression of caNrf2 either in epithelial cells or myeloid cells resulted in aggravation of DSS-induced acute colitis. Aggravation of inflammation by caNrf2 was not observed in the IL-10 (-/-) model of spontaneous chronic colitis, where even a trend towards reduced prolapse rate was observed.
CONCLUSIONS: Our findings show that a well-balanced redox homeostasis is as important in epithelial cells as in myeloid cells during induction of colitis. Aggravation of acute DSS colitis in response to constitutive Nrf2 expression emphasises the importance of tight regulation of Nrf2 during the onset of intestinal inflammation.